Statistical Methods For Retrieving Most Significant Paragraphs In Newspaper Articles

Authors

  • Jose Abracos
  • Gabriel Lopes
Abstract

Retrieving a most significant paragraph in a newspaper article can act as a kind of summarization. It can give the human reader some hints on the contents of the article and help him to decide whether it deserves a full reading or not. It may also act as a filter for a robust natural language understanding system, to extract relevant information from that paragraph in order to enable conceptual information retrieval. Taking a newspaper article and a base corpus, word co-occurrences with higher resolving power are identified. These co-occurrences are used to establish links between the paragraphs of the article. The paragraph which presents the larger number of links to other paragraphs is considered a most significant one. Though designed and tested for the Portuguese language, the statistical nature of our proposal should ensure its portability to other languages.
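The pipeline described in the abstract (find co-occurrences, link paragraphs that share them, pick the paragraph with the most links) can be illustrated with a minimal Python sketch. The abstract does not define "resolving power", so the sketch substitutes a simple stand-in: a word pair counts as resolving when it occurs in at most `max_base_count` documents of the base corpus. The function names, the tokenizer, and that threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"\w+", text.lower())


def pair_set(text):
    """All unordered pairs of distinct words that co-occur in one text unit."""
    words = sorted(set(tokenize(text)))
    return set(combinations(words, 2))


def most_significant_paragraph(paragraphs, base_corpus, max_base_count=0):
    """Return (index, link_counts) for the paragraph linked to the most others.

    ASSUMPTION: a co-occurrence is treated as 'resolving' when it appears in
    at most `max_base_count` base-corpus documents. This is a stand-in for
    the paper's measure, which the abstract does not specify.
    """
    if not paragraphs:
        raise ValueError("the article has no paragraphs")

    # Document frequency of every word pair in the base corpus.
    base_counts = Counter()
    for doc in base_corpus:
        base_counts.update(pair_set(doc))

    # Keep only the pairs that are rare in the base corpus.
    resolving = [
        {pair for pair in pair_set(p) if base_counts[pair] <= max_base_count}
        for p in paragraphs
    ]

    # Two paragraphs are linked when they share at least one resolving pair.
    links = [0] * len(paragraphs)
    for i, j in combinations(range(len(paragraphs)), 2):
        if resolving[i] & resolving[j]:
            links[i] += 1
            links[j] += 1

    # Ties are broken in favour of the earliest paragraph.
    best = max(range(len(paragraphs)), key=links.__getitem__)
    return best, links
```

Calling `most_significant_paragraph(paragraphs, base_corpus)` with a list of paragraph strings and a list of base-corpus document strings returns the index of the selected paragraph together with the per-paragraph link counts. Counting each linked paragraph once, rather than weighting by the number of shared pairs, follows the abstract's wording ("number of links to other paragraphs"), but it is still a design choice of this sketch.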

Similar resources

Using lexical chains to build hypertext links in newspaper articles

We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...

Building hypertext links in newspaper articles using semantic similarity

We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...

A Comparative Study of Topic Identification on Newspaper and E-mail

This paper presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by confronting them to very different data. This study is very fruitful for our research. Stat...

Automatically generating hypertext in newspaper articles by computing semantic relatedness

We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...

Extraction and Visualization of Trend Information from Newspaper Articles and Blogs

Trend information is a summarization of temporal statistical data, such as changes in product prices and sales. We propose a method for extracting trend information from multiple newspaper articles and blogs, and visualizing the information as graphs. As target texts for extraction of trend information, the MuST (Multimodal Summarization for Trend Information) workshop focuses on newspaper arti...

Publication date: 1997